ACGNet: Action Complement Graph Network for Weakly-Supervised Temporal Action Localization
Abstract
Weakly-supervised temporal action localization (WTAL) in untrimmed videos has emerged as a practical but challenging task, since only video-level labels are available. Existing approaches typically leverage off-the-shelf segment-level features, which suffer from spatial incompleteness and temporal incoherence, limiting their performance. In this paper, we tackle this problem from a new perspective by enhancing segment-level representations with a simple yet effective graph convolutional network, namely the action complement graph network (ACGNet). It enables the current video segment to perceive spatial-temporal dependencies on other segments that potentially convey complementary clues, implicitly mitigating the negative effects caused by the two issues above. By this means, the features become more discriminative and more robust to variations, contributing to higher localization accuracies. More importantly, the proposed ACGNet works as a universal module that can be flexibly plugged into different WTAL frameworks while maintaining end-to-end training. Extensive experiments are conducted on the THUMOS'14 and ActivityNet1.2 benchmarks, where the state-of-the-art results clearly demonstrate the superiority of the proposed approach.
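The core idea in the abstract, letting each segment aggregate complementary cues from related segments through a graph, can be sketched with a single graph-convolution step over a segment-similarity graph. This is a minimal illustration, not the authors' exact architecture: the cosine-similarity graph, the `top_k` sparsification, and the residual mixing are all simplifying assumptions.

```python
import numpy as np

def enhance_segments(feats, top_k=3):
    """Hedged sketch of graph-based segment enhancement.
    feats: (T, D) array of T segment features of dimension D.
    Each segment aggregates features from its top-k most similar
    segments via one row-normalized graph-convolution step."""
    # cosine similarity between all pairs of segment features
    norm = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    sim = norm @ norm.T
    np.fill_diagonal(sim, -np.inf)  # drop self-loops before top-k selection
    # keep only the top-k most similar segments as graph edges
    adj = np.zeros_like(sim)
    for i in range(sim.shape[0]):
        nbrs = np.argsort(sim[i])[-top_k:]
        adj[i, nbrs] = sim[i, nbrs]
    # row-normalize the adjacency and add the propagated features
    # back onto the originals (a residual connection)
    adj = adj / (adj.sum(axis=1, keepdims=True) + 1e-8)
    return feats + adj @ feats

segments = np.random.rand(10, 64)   # 10 segments, 64-dim features
enhanced = enhance_segments(segments)
print(enhanced.shape)               # (10, 64)
```

Because the output is `feats` plus a neighbor average, the module preserves the feature shape and can be dropped in front of any downstream WTAL head, which mirrors the plug-and-play property claimed in the abstract.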
Similar Papers
Weakly Supervised Action Localization by Sparse Temporal Pooling Network
We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks. Our algorithm learns from video-level class labels and predicts temporal intervals of human actions with no requirement of temporal localization annotations. We design our network to identify a sparse subset of key segments associated with target actions in a video usin...
Towards Weakly-Supervised Action Localization
This paper presents a novel approach for weakly-supervised action localization, i.e., that does not require per-frame spatial annotations for training. We first introduce an effective method for extracting human tubes by combining a state-of-the-art human detector with a tracking-by-detection approach. Our tube extraction leverages the large amount of annotated humans available today and outper...
Weakly Supervised Action Detection
Detection of human action in videos has many applications such as video surveillance and content based video retrieval. Actions can be considered as spatio-temporal objects corresponding to spatio-temporal volumes in a video. The problem of action detection can thus be solved similarly to object detection in 2D images [3] where typically an object classifier is trained using positive and negati...
Connectionist Temporal Modeling for Weakly Supervised Action Labeling
We propose a weakly-supervised framework for action labeling in video, where only the order of occurring actions is required during training time. The key challenge is that the per-frame alignments between the input (video) and label (action) sequences are unknown during training. We address this by introducing the Extended Connectionist Temporal Classification (ECTC) framework to efficiently e...
Action Recognition by Weakly-Supervised Discriminative Region Localization
We present a novel probabilistic model for recognizing actions by identifying and extracting information from discriminative regions in videos. The model is trained in a weakly-supervised manner: training videos are annotated only with training label without any action location information within the video. Additionally, we eliminate the need for any pre-processing measures to help shortlist ca...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i3.20216